Abstract

A geodesic is the shortest path between two vertices in a connected network. 
The geodesic is the kernel of various network metrics including radius, diameter, 
eccentricity, closeness, and betweenness. These metrics are the foundation of 
much network research and have thus been studied extensively in the domain of
single-relational networks (both in their directed and undirected forms). However,
geodesics for single-relational networks do not translate directly to
multi-relational, or semantic, networks, where vertices are connected to one another by
any number of edge labels. Here, a more sophisticated method for calculating a
geodesic is necessary. This article presents a technique for calculating geodesics
in semantic networks with a focus on semantic networks represented according 
to the Resource Description Framework (RDF). In this framework, a discrete 
"walker" utilizes an abstract path description called a grammar to determine 
which paths to include in its geodesic calculation. The grammar-based model 
forms a general framework for studying geodesic metrics in semantic networks. 



1. Introduction 

The study of networks (i.e. graph theory) is the study of the relationships
between vertices (i.e. nodes) as defined by the edges (i.e. arcs) connecting them.
In path analysis, a path metric function maps an ordered vertex pair to a
real number, where that real number is the length of the path connecting
the two vertices. Metrics that utilize the shortest path between two vertices
in their calculation are called geodesic metrics. The geodesic metrics that will
be reviewed in this article are shortest path, eccentricity [1], radius, diameter,
betweenness centrality [2], and closeness centrality [3].

If $G^1$ is a single-relational network, then $G^1 = (V, E)$, where
$V = \{i, \ldots, j\}$ is the set of vertices and $E \subseteq (V \times V)$ is a
subset of the product of $V$. In a single-relational network, all the edges have a
single, homogeneous meaning. Because an edge in a single-relational network is an
element of the product of $V$, it cannot represent the type of relationship that
exists between the two vertices it connects. An edge can only denote that there
is a relationship. Without a distinguishing label, all edges in such networks
have a single meaning. Thus, they are called single-relational networks.² While
a single-relational network supports the representation of a homogeneous set
of relationships, a semantic network supports the representation of a heterogeneous
set of relationships. For instance, in a single-relational network it is
possible to represent humans connected to one another by friendship edges; in a
semantic network, it is possible to represent humans connected to one another
by friendship, kinship, collaboration, communication, and other relationships.

A semantic network denoted $G^n$ can be defined as a set of single-relational
networks such that $G^n = (V, \mathbb{E})$, where $\mathbb{E} = \{E_0, E_1, \ldots, E_n\}$
and for any $E_k \in \mathbb{E}$, $E_k \subseteq (V \times V)$ [5]. The meaning of a
relationship in $G^n$ is determined by its set $E_k \in \mathbb{E}$. Perhaps a more
convenient semantic network representation, and the one to be used throughout the
remainder of this article, is that of the triple list where
$G^n \subseteq (V \times \Omega \times V)$ and $\Omega$ is a set of edge labels. A single
edge in this representation is denoted by a triple $\tau = (i, \omega, j)$, where vertex $i$ is
connected to vertex $j$ by the edge label $\omega$.

In some cases, it is possible to isolate sub-networks of a semantic network 
and represent the isolated network in an unlabeled form. Unlabeled geodesic 
metrics can be used to compute on the isolated component. However, in many 
cases, the complexity of the path description does not support an unlabeled 
representation. These scenarios require "semantically aware" geodesic metrics 
that respect a semantic network's ontology (i.e. the vertex classes and edge 
types) [6]. A semantic network is not simply a directed labeled network. It
is a high-level representation of complex objects and their relationships to one
another according to ontological constraints. There exist various algorithms to
study semantically typed paths in a network [7, 8, 9, 10, 11]. Such algorithms
assume only a path between two vertices and do not investigate other features 
of the intervening vertices. The benefit of the grammar-based geodesic model
presented in this article is that complex paths can be represented to make use
of path "bookkeeping." Such bookkeeping investigates intervening vertices even
though they may not be included in the final path solution. For example, it
may be important to determine a set of "friendship" paths between two human
vertices, where every intervening human works for a particular organization and
has a particular position in that organization. While a set of friendship paths is
the result of the function, the path detours to determine employer and position
are not. The technique for doing this is the primary contribution of this article.

² It is noted that bipartite networks allow for more than one edge meaning to be inferred
because $V$ is the union of two disjoint vertex sets. Thus, edges from set $A \subset V$ to set
$B \subset V$ (such that $A \cap B = \emptyset$) can have a different meaning than the edges from $B$ to $A$.
Also, theoretically, it is possible to represent edge labels as a topological feature of the graph
structure [4]. In other words, there exists an injective (though not surjective) function from
the set of semantic networks to the set of single-relational networks that preserves the meaning
of the edge labels.

A secondary contribution is the unification of the grammar-based model
proposed here with the grammar-based model proposed in [12] for calculating
stationary probability distributions in a subset of the full semantic network
(e.g. eigenvector centrality [13] and PageRank [14]). With the grammar-based
model, a single framework exists that ports many of the popular single-relational
network analysis algorithms to the semantic network domain. Moreover, an algebra
for mapping semantic networks to single-relational networks has been
presented in [15] and can be used to meaningfully execute standard
single-relational network analysis algorithms on distortions of the original semantic
network. The Semantic Web community does not often employ the standard
suite of network analysis algorithms. This is perhaps because the
Semantic Web is generally seen as a knowledge base grounded in description
logics rather than in graph or network theory. When the Semantic Web community
adopts a network interpretation, it can benefit from the extensive body of
work found in the network analysis literature. For example, recommendation
[15], ranking [17], and decision making [5] are a few of the types of Semantic
Web applications that can benefit from a network perspective. In other words,
graph/network theoretic techniques can be used to yield innovative solutions on
the Semantic Web.

The first half of this article will define a popular set of geodesic metrics for
single-relational networks. It will become apparent from these definitions that
the more advanced geodesics rely on the shortest path metric. The second half
of the article will present the grammar-based model for calculating a meaningful
shortest path in a semantic network. The other geodesics follow from this
definition.

2. Geodesics in Single-Relational Networks

This section will review a collection of popular geodesic metrics used to 
characterize a path, a vertex, and a network. The following list enumerates 
these metrics and identifies whether they are path, vertex, or network metrics: 

• in- and out-degree: vertex metric 

• shortest path: path metric 

• eccentricity: vertex metric 

• radius: network metric 

• diameter: network metric 

• closeness: vertex metric 

• betweenness: vertex metric.

It is worth noting that besides in- and out-degree, all the metrics mentioned
utilize a path function $\rho : V \times V \to Q$ to determine the set of paths between any
two vertices in $V$, where $Q$ is a set of paths. The premise of this article is that
once a path function is defined for a semantic network, all of the other
metrics are directly derived from it. In the semantic network case, the path function
$\rho : V \times V \times \Psi \to Q$ returns the set of paths between two vertices according
to a user-defined grammar $\Psi$.

Before discussing the grammar-based geodesic model for semantic networks, 
this section will review the geodesic metrics in the domain of single-relational 
networks. 

2.1. In- and Out-Degree 

The simplest structural metric for a vertex is the vertex's degree. While this 
is not a geodesic metric, it is presented as the concept will become necessary in 
the later section regarding semantic networks. 

For directed networks, any vertex $i \in V$ has both an in-degree and an out-degree.
The sets of edges in $E$ that are incoming to or outgoing from $i$ are given by the functions
$\Gamma^- : V \to E$ and $\Gamma^+ : V \to E$, respectively. If

\[ \Gamma^-(i) = \{(x,y) \mid (x,y) \in E \wedge y = i\} \]

and

\[ \Gamma^+(i) = \{(x,y) \mid (x,y) \in E \wedge x = i\}, \]

then $\Gamma^-(i)$ is the subset of edges in $E$ incoming to $i$ and $\Gamma^+(i)$ is the subset of
edges outgoing from $i$. The cardinalities of these sets are the in- and out-degree of
the vertex, denoted $|\Gamma^-(i)|$ and $|\Gamma^+(i)|$, respectively.
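The two edge-set functions can be illustrated with a minimal Python sketch. The edge set `E` and the function names `gamma_in` and `gamma_out` are illustrative assumptions, not part of the article.

```python
# Edge set E of a small directed single-relational network (illustrative data).
E = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")}

def gamma_in(i, edges):
    """Edges (x, y) in E with y == i, i.e. the edges incoming to i."""
    return {(x, y) for (x, y) in edges if y == i}

def gamma_out(i, edges):
    """Edges (x, y) in E with x == i, i.e. the edges outgoing from i."""
    return {(x, y) for (x, y) in edges if x == i}

in_degree = len(gamma_in("c", E))    # counts a->c and b->c
out_degree = len(gamma_out("c", E))  # counts c->a
print(in_degree, out_degree)
```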

2.2. Shortest Path 

The shortest path metric is the foundation for all other geodesic metrics. 
This metric is defined for any two vertices $i, j \in V$ such that the sink vertex $j$
is reachable from the source vertex $i$ in $G^1$ [18]. If $j$ is unreachable from $i$, the
shortest path between $i$ and $j$ is undefined. The shortest path between any two
vertices $i$ and $j$ in an unweighted network is the smallest of the set of all paths
between $i$ and $j$. If $\rho : V \times V \to Q$ is a function that takes two vertices and
returns a set of paths $Q$ where for any $q \in Q$, $q = (i, \ldots, j)$, then the shortest
path between $i$ and $j$ is $\min\left(\bigcup_{q \in Q} |q| - 1\right)$, where $\min$ returns the smallest
value of its domain. The shortest path function is denoted $s : V \times V \to \mathbb{N}$ with
the function rule

\[ s(i,j) = \min\left( \bigcup_{q \in \rho(i,j)} |q| - 1 \right). \]

It is important to subtract 1 from $|q|$ since the length of a path is the number of
edges traversed, not the number of vertices traversed. Thus, for the path
$q = (a, b, c, d)$, $|q|$ is 4, but the path length is 3.

Note that $\rho$ returns the set of all paths between $i$ and $j$. Of course, with the
potential for loops, this function could return a set with $|Q| = \infty$. Therefore, in many
cases, it is important to not consider all paths, but just those paths that have
the same cardinality as the shortest path currently found and thus are shortest
paths themselves. It is noted that all the remaining geodesic metrics require
only the shortest path between $i$ and $j$.
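As a sketch of how $s(i,j)$ might be computed without enumerating every path, the following Python uses a breadth-first search on an unweighted directed network. The adjacency-list representation `adj` and the function name are assumptions made for illustration; returning `None` stands in for the undefined case.

```python
from collections import deque

def shortest_path_length(adj, i, j):
    """Return s(i, j), the number of edges on a shortest path, or None if j is unreachable."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        v = queue.popleft()
        if v == j:
            return dist[v]
        for w in adj.get(v, ()):
            if w not in dist:          # first visit is along a shortest path
                dist[w] = dist[v] + 1
                queue.append(w)
    return None

# The path (a, b, c, d) from the text: four vertices, three edges.
adj = {"a": ["b"], "b": ["c"], "c": ["d", "a"], "d": []}
print(shortest_path_length(adj, "a", "d"))  # 3
print(shortest_path_length(adj, "d", "a"))  # None, since a is unreachable from d
```

Because breadth-first search visits vertices in order of distance, the loop between `c` and `a` does not cause the search to run indefinitely.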

2.3. Eccentricity, Radius, and Diameter 

The radius and diameter of a network require the determination of the eccentricity
of every vertex in $V$. The eccentricity metric requires $|V| - 1$ shortest path
calculations for a particular vertex [19]. The eccentricity
of a vertex $i$ is the largest shortest path between $i$ and all other vertices in $V$
such that the eccentricity function $e : V \to \mathbb{N}$ has the rule

\[ e(i) = \max\left( \bigcup_{j \in V} s(i,j) \right), \]

where $\max$ returns the largest value of its domain.

The radius of the network is the minimum eccentricity of all vertices in $V$
[19]. The function $r : G \to \mathbb{N}$ has the rule

\[ r(G^1) = \min\left( \bigcup_{i \in V} e(i) \right). \]

Finally, the diameter of a network is the maximum eccentricity of the vertices
in $V$ [19]. The function $d : G \to \mathbb{N}$ has the rule

\[ d(G^1) = \max\left( \bigcup_{i \in V} e(i) \right). \]
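The three definitions compose directly, as the following Python sketch shows. It assumes a network in which every vertex can reach every other vertex (otherwise some $s(i,j)$ is undefined and the sketch raises an error); the adjacency list `adj` and the function names are illustrative.

```python
from collections import deque

def distances_from(adj, i):
    """Breadth-first search giving s(i, j) for every j reachable from i."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def eccentricity(adj, i):
    """e(i): the largest shortest path length from i to any vertex in V."""
    dist = distances_from(adj, i)
    if len(dist) < len(adj):
        raise ValueError("some vertex is unreachable, so e(i) is undefined")
    return max(dist.values())

def radius(adj):
    """r(G): the minimum eccentricity over all vertices."""
    return min(eccentricity(adj, i) for i in adj)

def diameter(adj):
    """d(G): the maximum eccentricity over all vertices."""
    return max(eccentricity(adj, i) for i in adj)

# Undirected chain a - b - c - d, stored as symmetric adjacency lists.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(eccentricity(adj, "a"), radius(adj), diameter(adj))  # 3 2 3
```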




2.4. Closeness and Betweenness Centrality

Closeness and betweenness centrality are popular network metrics for determining
the "centralness" of a vertex. Closeness centrality is defined as the mean
shortest path between some vertex $i$ and all the other vertices in $V$ [3, 20, 21].
The function $c : V \to \mathbb{R}$ denotes the closeness function and has the rule

\[ c(i) = \frac{1}{\sum_{j \in V} s(i,j)}. \]

Betweenness centrality is defined for a vertex in $V$. The betweenness of
$i \in V$ is the number of shortest paths that exist between all vertices $j \in V$ and
$k \in V$ that have $i$ in their path divided by the total number of shortest paths
between $j$ and $k$, where $i \neq j \neq k$ [22]. If $\sigma : V \times V \to Q$ is a function that
returns the set of shortest paths between any two vertices $j$ and $k$ such that

\[ \sigma(j,k) = \bigcup_{q \in \rho(j,k)} q : |q| - 1 = s(j,k) \]

and $\hat{\sigma} : V \times V \times V \to Q$ is the set of shortest paths between two vertices $j$ and
$k$ that have $i$ in the path, where

\[ \hat{\sigma}(j,k,i) = \bigcup_{q \in \rho(j,k)} q : (|q| - 1 = s(j,k) \wedge i \in q), \]

then the betweenness function $b : V \to \mathbb{R}$ has the rule

\[ b(i) = \sum_{i \neq j \neq k \in V} \frac{|\hat{\sigma}(j,k,i)|}{|\sigma(j,k)|}. \]
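A brute-force Python sketch of both centralities follows. It takes closeness to be the reciprocal of the summed shortest path lengths from $i$, and it sums betweenness over ordered pairs $(j, k)$; both readings are assumptions about the intended formulas, as are the function names and the small example network. The network is assumed to be connected.

```python
from collections import deque
from itertools import permutations

def shortest_paths(adj, j, k):
    """Return every shortest path from j to k as a tuple of vertices."""
    dist, preds = {j: 0}, {j: []}
    queue = deque([j])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                preds[w] = [v]
                queue.append(w)
            elif dist[w] == dist[v] + 1:   # another equally short way to reach w
                preds[w].append(v)
    if k not in dist:
        return []

    def unwind(v):
        if v == j:
            return [(j,)]
        return [path + (v,) for u in preds[v] for path in unwind(u)]

    return unwind(k)

def closeness(adj, i):
    """Reciprocal of the summed shortest path lengths from i to all other vertices."""
    total = sum(len(shortest_paths(adj, i, j)[0]) - 1 for j in adj if j != i)
    return 1.0 / total

def betweenness(adj, i):
    """Sum over ordered pairs (j, k) of the fraction of shortest paths through i."""
    score = 0.0
    for j, k in permutations(adj, 2):
        if i in (j, k):
            continue
        paths = shortest_paths(adj, j, k)
        if paths:
            score += sum(1 for q in paths if i in q) / len(paths)
    return score

# Undirected diamond: a is linked to b and c, and both are linked to d.
adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
print(closeness(adj, "a"))    # 0.25
print(betweenness(adj, "b"))  # 1.0
```

In the diamond, `b` lies on one of the two shortest paths from `a` to `d` and one of the two from `d` to `a`, which is where the value 1.0 comes from. Enumerating all shortest paths is exponential in the worst case, so this is only suitable for small illustrative networks.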



It is worth noting that in [23], the author articulates the point that the
shortest paths between two vertices are not necessarily the only mechanism of
interaction between two vertices. Thus, the author develops a variation of the
betweenness metric that favors shortest paths, but does not utilize only shortest
paths in its betweenness calculation.

3. Semantic Network Grammars 

A semantic network is a directed labeled graph. However, a semantic network
is perhaps best interpreted in an object-oriented fashion where complex
objects (i.e. multi-vertex elements) are connected to one another according to
various relationship types. While a particular human is represented by a vertex,
metadata associated with that individual is represented in the vertices adjacent
to the human vertex (e.g. the human's name, address, age, etc.). In many instances,
particular metadata vertices are sinks (i.e. no outgoing edges). In other
cases, the metadata of an individual is another complex object such as the friend
of that human or the human's employer.

The topological features of a semantic network are represented by a data
type abstraction called an ontology (i.e. a semantic network schema). A popular
semantic network representation is the Resource Description Framework (RDF)
[23]. RDF Schema (RDFS) is a schema language for developing RDF ontologies
in RDF [25]. This article will present all of its concepts from the perspective of
RDF and RDFS, primarily because these are standard data models
with a large application base. However, these ideas can be generalized to any
semantic network representation, because one can remove the
constraint of using URIs, literals, and blank nodes when labeling vertices and
edges. When such a constraint is lifted, a directed, vertex/edge-labeled
multi-graph results. In the semantic network literature, such an abstract graph
type is named a semantic network [26]. The first subsection will briefly introduce
the concept of RDF and RDFS before describing an ontology for designing
geodesic grammars.

3.1. Introduction to RDF/RDFS

The RDF data model represents a semantic network as a triple list where
the vertices and edges (both called resources) are Uniform Resource Identifiers
(URI) [27], blank nodes, or literals. If the set of all URIs is denoted $U$, the set
of all blank nodes is denoted $B$, and the set of all literals is denoted $L$, then an
RDF network is the triple list $G^n$ such that

\[ G^n \subseteq ((U \cup B) \times U \times (U \cup B \cup L)). \]

The first resource of a triple is called the subject, the second is called the
predicate, and the third is called the object. A single triple $\tau \in G^n$ is denoted
as $\tau = (s, p, o)$.

All URIs are namespaced such that the URI http://www.lanl.gov#marko
has a namespace of http://www.lanl.gov# and a fragment of marko. In many
cases, for document and diagram clarity, a namespace is prefixed in such a
way that the previous URI is represented as lanl:marko. In this article, the
namespaces for RDF and RDFS will be prefixed as rdf and rdfs, respectively.

Blank nodes are "anonymous" vertices and are not discussed in this article
as they do not directly pertain to any of the concepts presented. Literals are
any resource that denotes a string, integer, floating point, date, etc. The full
taxonomy of literal types is presented in [28].

In RDFS, every vertex is tied to some platonic category representing its
rdfs:Class using the rdf:type property. Moreover, every edge label has
domain/range restrictions that determine the vertex types that the edge labels
can be used in conjunction with. Because the instance of an ontology obeys the
defined constraints of the ontology, the modeler has an abstract representation
of the topological features of the semantic network instance in terms of classes
(vertices) and properties (edge labels). For example,

(lanl:hasFriend, rdfs:domain, lanl:Human)
(lanl:hasFriend, rdfs:range, lanl:Human)

states that any resource of type lanl:Human can have a friend that is only of
type lanl:Human. Therefore, the following three triples are legal according to
the simple ontology above:

(lanl:marko, rdf:type, lanl:Human)
(lanl:jen, rdf:type, lanl:Human)
(lanl:marko, lanl:hasFriend, lanl:jen).

However, the three statements

(lanl:marko, rdf:type, lanl:Human)
(lanl:fluffy, rdf:type, lanl:Dog)
(lanl:marko, lanl:hasFriend, lanl:fluffy)

are not legal according to the ontology because lanl:fluffy is a lanl:Dog and
a lanl:Human cannot befriend anything that is not a lanl:Human.
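The legality check described above can be sketched in Python by treating triples as tuples in a set. This follows the article's reading of rdfs:domain and rdfs:range as constraints on an instance (RDFS itself defines them as entailment rules), and the helper names `types_of` and `is_legal` are illustrative assumptions.

```python
# Ontology and instance triples from the example, stored as (subject, predicate, object).
ontology = {
    ("lanl:hasFriend", "rdfs:domain", "lanl:Human"),
    ("lanl:hasFriend", "rdfs:range", "lanl:Human"),
}
instance = {
    ("lanl:marko", "rdf:type", "lanl:Human"),
    ("lanl:jen", "rdf:type", "lanl:Human"),
    ("lanl:fluffy", "rdf:type", "lanl:Dog"),
}

def types_of(resource, triples):
    """All classes that the resource is directly typed with."""
    return {o for (s, p, o) in triples if s == resource and p == "rdf:type"}

def is_legal(triple, instance, ontology):
    """Check a triple against the domain and range declared for its predicate."""
    s, p, o = triple
    for (prop, constraint, cls) in ontology:
        if prop != p:
            continue
        if constraint == "rdfs:domain" and cls not in types_of(s, instance):
            return False
        if constraint == "rdfs:range" and cls not in types_of(o, instance):
            return False
    return True

print(is_legal(("lanl:marko", "lanl:hasFriend", "lanl:jen"), instance, ontology))     # True
print(is_legal(("lanl:marko", "lanl:hasFriend", "lanl:fluffy"), instance, ontology))  # False
```

The sketch only consults direct rdf:type statements; subsumption would need to be resolved first if classes were arranged in a hierarchy.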

The ontology and legal instance of the previous example are diagrammed
in Figure 1. However, for the sake of brevity and clarity of the diagram, the
domain and range properties of a class can be abbreviated as in Figure 2. The
abbreviated ontological diagram will be used throughout the remainder of this
article. It is important to note that both the RDFS ontology and RDF instance
network are represented in RDF and thus, both instances and ontology are
contained within a single semantic network.




Figure 1: The full representation of all triples in the ontology and instance layers of the 
semantic network example. 



Figure 2: The abbreviated representation of the ontology and instance layers of the semantic
network example.



Finally, an important concept in RDFS is rdfs:Class and rdf:Property
subsumption as denoted by the rdfs:subClassOf and rdfs:subPropertyOf
predicates, respectively. With the rdfs:subClassOf and rdfs:subPropertyOf
predicates, it is possible to generate concept hierarchies. For the purposes of
this article, it is only necessary to understand that subsumption is transitive
such that if

(lanl:fluffy, rdf:type, lanl:Dog)
(lanl:Dog, rdfs:subClassOf, lanl:Mammal)
(lanl:Mammal, rdfs:subClassOf, lanl:Animal),

then it can be inferred that because lanl:fluffy is a lanl:Dog, lanl:fluffy
is also both a lanl:Mammal and a lanl:Animal. Transitivity exists for the
rdfs:subPropertyOf predicate as well.
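The transitive inference can be sketched in Python as a walk up the rdfs:subClassOf edges. The function names `superclasses` and `inferred_types` are illustrative assumptions; the triples are those of the example.

```python
triples = {
    ("lanl:fluffy", "rdf:type", "lanl:Dog"),
    ("lanl:Dog", "rdfs:subClassOf", "lanl:Mammal"),
    ("lanl:Mammal", "rdfs:subClassOf", "lanl:Animal"),
}

def superclasses(cls, triples):
    """All classes reachable from cls by following rdfs:subClassOf edges."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for (s, p, o) in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found

def inferred_types(resource, triples):
    """Direct rdf:type classes of a resource plus everything they are subsumed by."""
    direct = {o for (s, p, o) in triples if s == resource and p == "rdf:type"}
    result = set(direct)
    for cls in direct:
        result |= superclasses(cls, triples)
    return result

print(sorted(inferred_types("lanl:fluffy", triples)))
# ['lanl:Animal', 'lanl:Dog', 'lanl:Mammal']
```

The same traversal applies to rdfs:subPropertyOf by changing the predicate that is followed.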

3.2. Defining a Grammar 

This subsection will define the RDFS ontology for creating a grammar. Any
user-defined grammar must obey this ontology. The grammar constructed from
this ontology determines the meaning of the value returned by a "semantically
aware" geodesic function. Any grammar instance is denoted
$\Psi \subseteq ((U \cup B) \times U \times (U \cup B \cup L))$.

The instance of a grammar is represented in RDF and the ontology of the
grammar is represented in RDFS. Figure 3 diagrams the ontology of the geodesic
grammar, where edges represent properties whose tail is the domain of the
property and whose head is the range of the property. Furthermore, the dashed
edges denote the RDFS property rdfs:subClassOf.



[Diagram. Classes: rwr:Context, rwr:EntryContext, rwr:ExitContext, rwr:Attributes,
rwr:Attribute, rwr:NotEver, rwr:Is, rwr:Not, rwr:Rules, rwr:Rule, rwr:PathCount,
rwr:Traverse, rwr:Edge, rwr:InEdge, rwr:OutEdge, rdf:Bag, rdf:Seq, rdf:Property,
rdfs:Resource, rdfs:Literal, rdfs:ContainerMembershipProperty. Properties:
rwr:forResource, rwr:hasAttributes, rwr:hasAttribute, rwr:hasRules, rwr:hasEdge,
rwr:hasPredicate, rwr:hasSubject, rwr:hasObject, rwr:steps.]

Figure 3: The ontology for a geodesic path grammar.



The remainder of this section will present an informal review of the major
components of the grammar ontology. The next section will formalize all aspects
of the resources diagrammed in Figure 3.

Grammar-based geodesics rely on a discrete walker. The walker utilizes a
$\Psi$ grammar to constrain its path through $G^n$. The combination of a walker
and a $\Psi$ is a breadth-first search through a particular sub-network of $G^n$. That
sub-network is abstractly represented by $\Psi$, but not fully realized until after the
execution of $\Psi$ on $G^n$.

Any $\Psi$ is a collection of rwr:Context resources connected to one another
by rwr:Traverse resources. Each rwr:Context is an abstract representation
of a legal step along a path that a walker can traverse on its way from source
vertex $i$ to sink vertex $j$. An rwr:Context has an associated rwr:forResource
property. The object of that property determines the set of legal vertices
that the rwr:Context can resolve to. Only when a walker utilizes a grammar do
the rwr:Contexts have a resolution to a particular vertex in $G^n$. rwr:Context
resolution is further constrained by the rwr:Rules and rwr:Attributes of the
rwr:Context in $\Psi$.

Two important data structures that are used in a grammar are the rdf:Bag
and rdf:Seq. An rdf:Bag is an unordered set of elements where each element of
the rdf:Bag is the object of a triple with predicate rdf:li. An rdf:Seq is an
ordered set of elements where each element of the rdf:Seq is the object of a triple
with a predicate that is an rdfs:subPropertyOf rdfs:ContainerMembershipProperty
(i.e. rdf:_1, rdf:_2, rdf:_3, etc.).

There exist two rwr:Rules (an rdfs:subClassOf rdf:Seq): rwr:PathCount
and rwr:Traverse. The rwr:PathCount rule instructs the walker to record the
vertex, edge, and directionality in the ordered path set that is ultimately
returned by the grammar-based geodesic algorithm. The rwr:Traverse rule
instructs the walker to select some outgoing or incoming edge of its current vertex
as defined by the set of rwr:Edges associated with the rwr:Traverse rule. If
more than one choice exists for the walker, the walker takes every choice by
cloning itself and having each clone take a unique branch of the path.

There exist three rwr:Attributes (an rdfs:subClassOf rdf:Bag): rwr:NotEver,
rwr:Is, and rwr:Not. In some instances, when traversing to a new vertex, the
walker must respect the fact that it has already seen a particular vertex. The
rwr:NotEver attribute ensures that the resolution of the rwr:Context is not
a previously seen vertex, thus preventing infinite loops. The rwr:Is attribute
allows the walker to explore an area around a particular vertex (i.e. other paths
not directly associated with the return path) while still ensuring that the walker
returns to the original vertex. Finally, the rwr:Not attribute ensures that the
walker does not return to a particular previously seen vertex.

If vertex $i$ is the head of the path (i.e. source), then it is defined in an
rwr:EntryContext. If vertex $j$ is the tail of the path (i.e. sink), then it is
defined in an rwr:ExitContext. The purpose of the walker is to move from
source to sink in $G^n$ by respecting the rwr:Rules and rwr:Attributes of the
rwr:Contexts that it traverses in $\Psi$. Figure 4 diagrams the relationship between
a walker, its grammar $\Psi$, and its network instance $G^n$. The grammar acts as
a user-defined "program" that the walker executes, where the language of that
program is defined by the grammar ontology.

The next section will formalize the grammar.

Figure 4: A walker $p$ walks both $\Psi$ and $G^n$.



4. Formalizing the Grammar-Based Model 

Once a grammar has been defined according to the constraints of the ontology
diagrammed in Figure 3, the path function $\rho : V \times V \times \Psi \to Q$ can be
executed. The function $\rho$ returns the set of all paths between any two vertices
$i, j \in V$. This section will define the rules by which $\rho$ interprets its domain
parameters and ultimately derives a path set.

The grammar-based model requires the walker to query $G^n$ such that it can
determine the set of legal vertices and edges that it can traverse. Moreover,
the walker must be able to query $\Psi$ in order to know which rwr:Rules and
rwr:Attributes to respect. The mechanism by which the walker queries $G^n$
and $\Psi$ is called the symbol binding model. For example, the following query

\[
\begin{aligned}
X = \{?x \mid\ & (?x, \text{lanl:hasFriend}, \text{lanl:jhw}) \in G^n \\
\wedge\ & (?x, \text{lanl:worksFor}, \text{lanl:LANL}) \in G^n\}
\end{aligned}
\]

would fill the unordered set $X$ with all people that have lanl:jhw as their friend
and who work for lanl:LANL. A more advanced query example is

\[
\begin{aligned}
X = \{?x, ?y \mid\ & (?x, \text{lanl:hasFriend}, ?y) \in G^n \\
\wedge\ & (?y, \text{lanl:worksFor}, \text{lanl:LANL}) \in G^n \\
\wedge\ & (?x, \text{lanl:worksFor}, \text{lanl:PNNL}) \in G^n\}.
\end{aligned}
\]

In the above query, the set $X$ is an unordered set of ordered pairs of friends
where one of the friends works at lanl:LANL and the other works at lanl:PNNL.
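The two symbol binding queries map naturally onto Python set comprehensions over a set of triples. The sketch below reads the second query as requiring ?y to work for lanl:LANL and ?x to work for lanl:PNNL; the instance data in `G` (including lanl:jen) is hypothetical and only serves to make the bindings concrete.

```python
# Hypothetical instance data, stored as (subject, predicate, object) triples.
G = {
    ("lanl:marko", "lanl:hasFriend", "lanl:jhw"),
    ("lanl:marko", "lanl:worksFor", "lanl:LANL"),
    ("lanl:jen", "lanl:hasFriend", "lanl:jhw"),
    ("lanl:jen", "lanl:worksFor", "lanl:PNNL"),
    ("lanl:jhw", "lanl:worksFor", "lanl:LANL"),
}

# First query: bind ?x to everyone who has lanl:jhw as a friend and works for lanl:LANL.
X1 = {
    x
    for (x, p, o) in G
    if p == "lanl:hasFriend" and o == "lanl:jhw"
    and (x, "lanl:worksFor", "lanl:LANL") in G
}

# Second query: bind (?x, ?y) to friend pairs where ?y works for LANL and ?x for PNNL.
X2 = {
    (x, y)
    for (x, p, y) in G
    if p == "lanl:hasFriend"
    and (y, "lanl:worksFor", "lanl:LANL") in G
    and (x, "lanl:worksFor", "lanl:PNNL") in G
}

print(X1)  # {'lanl:marko'}
print(X2)  # {('lanl:jen', 'lanl:jhw')}
```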

4.1. Initializing a Walker p

The path function $\rho$ is supplied with a start vertex $i$, an end vertex $j$, and
a grammar $\Psi$. Upon the execution of $\rho$, a single walker, denoted $p$, is created
and added to the set of walkers $P$, where at $n = 0$, $|P| = 1$, and $n \in \mathbb{N}$ is
discrete time. The set $P$ may increase in size over the course of the algorithm
as clone particles are created where multiple legal options exist for traversal.

Every walker has two ordered multi-sets associated with it: $g^p$ and $q^p$. The
multi-set $g^p$ is an ordered set of vertices, edges, and edge directions traversed
by $p$, where $g^p_n$ is the vertex location of $p$ at time step $n$. The element $g^p_{n'}$
denotes the predicate (i.e. edge label) used by $p$ to traverse to $g^p_n$ and the
element $g^p_{n''}$ denotes the directionality of the predicate used in that traversal.
For example, suppose $g^p$ = (lanl:marko, lanl:hasFriend, +, lanl:jhw,
lanl:hasFriend, +, lanl:norman). In the presented path, $g^p_0$ = lanl:marko,
$g^p_{1'}$ = lanl:hasFriend, $g^p_{1''} = +$, $g^p_1$ = lanl:jhw, $g^p_{2'}$ = lanl:hasFriend,
$g^p_{2''} = +$, and $g^p_2$ = lanl:norman. Note that $g^p_{0'} = \emptyset$ and $g^p_{0''} = \emptyset$. The example
path is diagrammed in Figure 5.



Figure 5: An example of a $g^p$ path.



The multi-set $q^p$ is an ordered set of vertices, edges, and directionalities
that are recorded by $p$ along its path through $G^n$. The set $q^p$ maintains the
same $'$ and $''$ indexing schema as $g^p$. The main distinction between $g^p$ and
$q^p$ is that $q^p$ is the returned path, not the actual path of $p$. If $p$ reaches its
destination rwr:ExitContext in $\Psi$ and thus vertex $j \in V$, then the set $q^p$ is
one of the elements in the return set $Q$ of the path function $\rho$. Thus, for the
grammar-based geodesic model,

\[ Q = \bigcup_{p \in P} q^p : \left( q^p_{\frac{|q^p| - 1}{3}} = j \right). \]

The $\frac{|q^p| - 1}{3}$ is necessary to transform the length of $q^p$ into an index in $n$ time
(due to the $'$ and $''$ notation convention) because the set $q^p$ includes edge labels
and edge directionality as well as vertices.
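The indexing convention can be made concrete with a small Python sketch that stores a walker path as a flat tuple of the form (vertex, predicate, direction, vertex, ...). It assumes that the empty $0'$ and $0''$ entries are not stored, so a path with final time step $n$ has $3n + 1$ elements; the accessor names are illustrative.

```python
# The example path, stored flat: vertex, predicate, direction, vertex, ...
g = ("lanl:marko", "lanl:hasFriend", "+",
     "lanl:jhw", "lanl:hasFriend", "+",
     "lanl:norman")

def vertex(path, n):
    """The vertex occupied at time step n."""
    return path[3 * n]

def predicate(path, n):
    """The edge label used to arrive at step n (None for n = 0)."""
    return path[3 * n - 2] if n > 0 else None

def direction(path, n):
    """The edge direction used to arrive at step n (None for n = 0)."""
    return path[3 * n - 1] if n > 0 else None

def last_step(path):
    """Convert the length of the flat path into its final time index."""
    return (len(path) - 1) // 3

print(vertex(g, 1), predicate(g, 1), direction(g, 1))  # lanl:jhw lanl:hasFriend +
print(last_step(g), vertex(g, last_step(g)))           # 2 lanl:norman
```

Under this storage assumption, a walker's path belongs to the return set when `vertex(path, last_step(path))` equals the sink vertex.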

4.2. Entering $G^n$ and $\Psi$

The initial walker $p$ starts its journey at the rwr:EntryContext in $\Psi$ and
the vertex $i$ in $V$. Thus, $g^p_0 = i$. As in Figure 3, the rwr:EntryContext
must be the domain of the predicate rwr:forResource whose range is $i$. An
rwr:EntryContext must have no rwr:Attributes and must have the rule
rwr:PathCount such that $q^p_0 = i$.

From $i \in V$ and the rwr:EntryContext in $\Psi$, $p$ will move to some new $k \in V$
and some new rwr:Context in $\Psi$. Before discussing the rwr:Traverse rule, it
is necessary to discuss the attributes that determine the set of legal edges that
can be traversed by $p$.

4.3. The rwr:NotEver Attribute

The rwr:NotEver attribute is useful for ensuring that path loops, which would
cause the path algorithm to run indefinitely, do not occur. If $p$ is trying
to traverse to a new rwr:Context at $n + 1$ and that rwr:Context has the
rwr:NotEver attribute, then

\[ X(p)_{n+1} = \bigcup_{m \leq n} g^p_m. \]

The set $X(p)_{n+1}$ is the set of vertices in $V$ that $p$ cannot legally resolve
the $n + 1$ rwr:Context to. Note that the definition of $X(p)$ does not include
edge labels or edge directionality, only vertices. This is due to the fact that the
time indices ($n$) of $g^p$ are not superscripted with $'$ or $''$.
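As a sketch, the rwr:NotEver exclusion set can be read off a walker path by keeping only its vertex positions. The flat path layout (vertex, predicate, direction, vertex, ...) and the candidate set below are assumptions made for illustration.

```python
def vertices_seen(g):
    """Every vertex the walker has occupied so far; predicates and directions are skipped."""
    return set(g[0::3])

# Walker that moved from lanl:marko to lanl:jhw along a lanl:hasFriend edge.
g = ("lanl:marko", "lanl:hasFriend", "+", "lanl:jhw")

# Hypothetical vertices reachable from lanl:jhw on the next step.
candidates = {"lanl:marko", "lanl:norman"}

# With rwr:NotEver on the next context, previously seen vertices are removed.
legal = candidates - vertices_seen(g)
print(legal)  # {'lanl:norman'}
```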

4.4. The rwr:Is Attribute

The rwr:Is attribute guarantees that the vertex resolved to by a particular
rwr:Context is a vertex seen on a previous step of the walker's $g^p$. For instance,
suppose that a walker must check that a particular individual works for
the Los Alamos National Laboratory before traversing a different edge label of
lanl:jhw. This problem is diagrammed in Figure 6.



Figure 6: rwr:Is can be used to ensure that a walker backtracks.



In Figure 6, the walker is at lanl:jhw at time step $n = 1$. At time step
$n = 2$, the walker must check to see if lanl:jhw lanl:worksFor lanl:LANL.
To do so, the walker will traverse the lanl:worksFor edge. Upon validating
lanl:LANL, the walker must return to lanl:jhw. Therefore, the walker
will take the inverse of the lanl:worksFor edge (i.e. oppose the directionality
of the edge). However, despite the existence of an inverse lanl:worksFor edge
to lanl:marko, the walker should not clone itself. Therefore, in order to specify
that the walker must return to lanl:jhw, it is important to use the rwr:Is
attribute such that only a single walker $p$ returns to lanl:jhw at $n = 3$ and $P$
is unchanged.

The set of all legal vertices that an rwr:Context can resolve to is defined
by the set $O$, where if $\psi$ is the rwr:Context at $n + 1$ that maintains an rwr:Is
attribute, then

\[
\begin{aligned}
M = \{?m \mid\ & (\psi, \text{rwr:hasAttributes}, ?x) \in \Psi \\
\wedge\ & (?x, \text{rwr:hasAttribute}, ?y) \in \Psi \\
\wedge\ & (?y, \text{rdf:type}, \text{rwr:Is}) \in \Psi \\
\wedge\ & (?y, \text{rwr:step}, ?m) \in \Psi\}
\end{aligned}
\]

and

\[ O(p)_{n+1} = \bigcup_{m \in M} g^p_{n-m}. \]

The set $O(p) \subseteq V$ is the set of legal vertex resources that the $n + 1$ rwr:Context
can resolve to and is used in the calculation of an rwr:Traverse at $n$.
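The backtracking example of Figure 6 can be sketched in Python, reading the rwr:Is set as the vertices the walker occupied $m$ steps before time $n$ for each step value $m$ in $M$. The flat path layout, the starting vertex lanl:norman at $n = 0$, and the function names are assumptions made for illustration.

```python
def vertex_at(g, n):
    """Vertex occupied at time step n in a flat (vertex, predicate, direction, ...) path."""
    return g[3 * n]

def is_set(g, n, steps):
    """Vertices occupied m steps before time n, for every step value m in M."""
    return {vertex_at(g, n - m) for m in steps if n - m >= 0}

# Walker at lanl:jhw at n = 1 and at lanl:LANL at n = 2 (start vertex is hypothetical).
g = ("lanl:norman", "lanl:hasFriend", "+",
     "lanl:jhw", "lanl:worksFor", "+",
     "lanl:LANL")
n = 2
M = {1}  # step value assumed to be attached to the rwr:Is attribute

O = is_set(g, n, M)

# Both lanl:jhw and lanl:marko have a lanl:worksFor edge into lanl:LANL.
candidates = {"lanl:jhw", "lanl:marko"}
legal = candidates & O if O else candidates
print(legal)  # {'lanl:jhw'}
```

Only one legal vertex remains, so the walker returns to lanl:jhw without cloning itself.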



4.5. The rwr:Not Attribute

The rwr:Not attribute determines the set of vertices that the $n+1$ rwr:Context
cannot resolve to. This is similar to the $X(p)$ set, except that it is for some $n$,
not for all $n$ in the past. For example, suppose that the walker must only consider
an article co-authorship network. This problem is diagrammed in Figure 7.



Figure 7: rwr:Not can be used to ensure that a walker does not backtrack.



In Figure 7, the walker must determine if the article doi:10.1007/s11192-006-0176-z
has at least 2 co-authors. In order to do so, the walker must not return to
lanl:jbollen at $n = 3$. If

\[
\begin{aligned}
M = \{?m \mid\ & (\psi, \text{rwr:hasAttributes}, ?x) \in \Psi \\
\wedge\ & (?x, \text{rwr:hasAttribute}, ?y) \in \Psi \\
\wedge\ & (?y, \text{rdf:type}, \text{rwr:Not}) \in \Psi \\
\wedge\ & (?y, \text{rwr:step}, ?m) \in \Psi\}
\end{aligned}
\]

and

\[ \hat{X}(p)_{n+1} = \bigcup_{m \in M} g^p_{n-m}, \]

then $\hat{X}(p) \subseteq V$ is the set of vertices that the $n + 1$ rwr:Context $\psi$ must not
resolve to and is used in the calculation of an rwr:Traverse at $n$.
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4-6. The rwr: Traverse Rule 

The rwr:Traverse rule is perhaps the most important aspect of the grammar.
An rwr:Traverse rule of an rwr:Context determines the next rwr:Context that p
should traverse to in $\Psi$ as well as the next vertex in $V$. It utilizes
the previously defined attribute sets $X(p)$, $O(p)$, and $\bar{X}(p)$ in its
calculation. An rwr:Traverse rule is composed of a set of rwr:Edges that can
be either incoming or outgoing. Thus, unlike in directed networks, the path
of a p is not constrained by the directionality of the edges. The $\Gamma$
functions are defined as $\Gamma : V \times P \rightarrow G^n$ and $\tau$ is
the rwr:Traverse rule of the current rwr:Context $\psi$. Therefore, if

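$$
\begin{aligned}
Y_{\text{out}} = \{?y \mid\; & (\tau, \texttt{rwr:hasEdge}, ?y) \in \Psi \\
\wedge\; & (?y, \texttt{rdf:type}, \texttt{rwr:OutEdge}) \in \Psi\},
\end{aligned}
$$

$$
\begin{aligned}
Y_{\text{in}} = \{?y \mid\; & (\tau, \texttt{rwr:hasEdge}, ?y) \in \Psi \\
\wedge\; & (?y, \texttt{rdf:type}, \texttt{rwr:InEdge}) \in \Psi\},
\end{aligned}
$$

$$
\begin{aligned}
\Gamma^{+}(a,p) = \bigcup_{y \in Y_{\text{out}}} \{\langle a, ?\omega, ?b \rangle \mid\; & (a, ?\omega, ?b) \in G^n \\
\wedge\; & (y, \texttt{rwr:hasPredicate}, ?w) \in \Psi \\
\wedge\; & ((?\omega, \texttt{rdfs:subPropertyOf}, ?w) \in G^n \vee ?\omega = ?w) \\
\wedge\; & (y, \texttt{rwr:hasObject}, ?x) \in \Psi \\
\wedge\; & (?x, \texttt{rwr:forResource}, ?z) \in \Psi \\
\wedge\; & ((?b, \texttt{rdf:type}, ?z) \in G^n \vee ?b = ?z) \\
\wedge\; & (O(p)_{n+1} = \emptyset \vee ?b \in O(p)_{n+1}) \\
\wedge\; & ?b \notin X(p)_{n+1} \wedge ?b \notin \bar{X}(p)_{n+1}\},
\end{aligned}
$$

and

$$
\begin{aligned}
\Gamma^{-}(a,p) = \bigcup_{y \in Y_{\text{in}}} \{\langle ?b, ?\omega, a \rangle \mid\; & (?b, ?\omega, a) \in G^n \\
\wedge\; & (y, \texttt{rwr:hasPredicate}, ?w) \in \Psi \\
\wedge\; & ((?\omega, \texttt{rdfs:subPropertyOf}, ?w) \in G^n \vee ?\omega = ?w) \\
\wedge\; & (y, \texttt{rwr:hasSubject}, ?x) \in \Psi \\
\wedge\; & (?x, \texttt{rwr:forResource}, ?z) \in \Psi \\
\wedge\; & ((?b, \texttt{rdf:type}, ?z) \in G^n \vee ?b = ?z) \\
\wedge\; & (O(p)_{n+1} = \emptyset \vee ?b \in O(p)_{n+1}) \\
\wedge\; & ?b \notin X(p)_{n+1} \wedge ?b \notin \bar{X}(p)_{n+1}\},
\end{aligned}
$$

then

$$
\Gamma(a,p) = \Gamma^{+}(a,p) \cup \Gamma^{-}(a,p),
$$

where $\Gamma(a,p)$ is the set of legal edges that p can traverse given its
current $V$ location of $a$ and $\Psi$ location $\psi$. Note that the set
$\Gamma(a,p)$ has a unique set of elements. If $\Gamma(a,p) = \emptyset$, then
p halts.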

Unlike the grammar-based eigenvector model of [12], the geodesic requires
the searching of all legal paths. In line with a breadth-first search, all
network branches are checked. Thus, for every triple in $\Gamma(a,p)$, a clone
walker is created and added to P. This idea will be made more salient in the
example to follow.
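The edge selection and cloning step can be illustrated with a simplified Python sketch. It is not the full rule: the rdfs:subPropertyOf and rdf:type checks are omitted, the attribute sets are passed in as plain Python sets, and the names `traverse`, `only`, and `never` are hypothetical. The four triples are a small assumed fragment of a friendship network.

```python
def traverse(graph, a, out_labels, in_labels, only=frozenset(), never=frozenset()):
    """Return the legal edges at vertex `a` as (subject, predicate, object) triples.

    graph      -- set of triples standing in for the semantic network
    out_labels -- predicates that may be followed along the edge direction
    in_labels  -- predicates that may be followed against the edge direction
    only       -- rwr:Is set; if non-empty, the next vertex must be a member
    never      -- union of the rwr:NotEver and rwr:Not sets
    """
    def legal(v):
        return (not only or v in only) and v not in never

    outgoing = {(s, w, o) for (s, w, o) in graph
                if s == a and w in out_labels and legal(o)}
    incoming = {(s, w, o) for (s, w, o) in graph
                if o == a and w in in_labels and legal(s)}
    return outgoing | incoming


graph = {
    ("lanl:johan", "lanl:hasFriend", "lanl:marko"),
    ("lanl:marko", "lanl:hasFriend", "lanl:johan"),
    ("lanl:marko", "lanl:hasFriend", "lanl:jhw"),
    ("lanl:marko", "lanl:hasFriend", "lanl:norman"),
}

edges = traverse(graph, "lanl:marko",
                 out_labels={"lanl:hasFriend"}, in_labels=set(),
                 never={"lanl:johan", "lanl:marko"})

# One clone walker per legal edge, as in a breadth-first expansion.
clones = sorted(edges)
print(len(clones))
```

An empty result from `traverse` corresponds to the walker halting.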

4.7. The rwr:PathCount Rule

The rwr:PathCount rule is the mechanism by which values in $g^p$ get
appended to $q^p$, where $q^p$ is the path returned by p at the end of the
algorithm's execution. The rule instructs p to append a path segment in $g^p$
to the ordered multi-set $q^p$. If a particular rwr:Context $\psi$ has the
rwr:PathCount rule with the rwr:step $x$ such that $x \in \mathbb{N}$, then p
will append $g^{p\prime}_{n-x}$, $g^{p\prime\prime}_{n-x}$, and $g^{p}_{n-x}$
(the edge label, edge directionality, and vertex of that segment) to $q^p$
such that none of the elements copied from $g^p$ are $\emptyset$ and they are
added in their respective order.
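A minimal sketch of this append step, assuming the walker's path is stored as one (edge label, directionality, vertex) tuple per time step, with `None` marking an empty slot. The function name `path_count` and the list layout are illustrative assumptions; the path values are taken from the example in Section 5.1.

```python
def path_count(g, q, n, step):
    """Append the segment of g at time step n - step to q, skipping empty slots."""
    label, direction, vertex = g[n - step]
    q.extend(x for x in (label, direction, vertex) if x is not None)
    return q


# Walker path up to n = 3 in the example of Section 5.1.
g = [
    (None, None, "lanl:johan"),
    ("lanl:hasFriend", "+", "lanl:marko"),
    ("lanl:hasPosition", "+", "lanl:Researcher"),
    ("lanl:hasPosition", "-", "lanl:marko"),
]

q = []
path_count(g, q, n=0, step=0)  # rwr:PathCount_0 with rwr:step "0"
path_count(g, q, n=3, step=2)  # rwr:PathCount_3 with rwr:step "2"
print(q)
```

The step of 2 at n = 3 skips the detour through lanl:Researcher, so only the friendship segment reaches the return path.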

The next section will present the aforementioned rules and attributes within 
the framework of a particular social network ontology in order to demonstrate 
a practical application. 

5. Geodesics in a Semantic Social Network

This section applies the previously presented ideas, through two examples,
to the problem of calculating semantically meaningful geodesic functions
within a semantic social network. Figure 8 presents an RDFS network ontology
that will be used throughout the remainder of this section. Note that the
domain and range of the properties are denoted by the tail and head of the
edge, respectively.




[Figure 8 diagram: surviving edge labels are lanl:hasFriend and
lanl:contacted.]

Figure 8: An example semantic social network ontology.



Figure 9 diagrams an example instance that respects the ontological
constraints diagrammed in Figure 8.

[Figure 9 diagram: surviving labels are lanl:johan, lanl:Researcher,
lanl:REFR, lanl:worksFor, lanl:hasPosition, and lanl:hasFriend.]

Figure 9: An example semantic social network instance.



The first example will demonstrate how to determine all the non-recurrent
paths between the vertices lanl:johan and lanl:norman such that only
friendship paths are taken and the intervening friend vertices must have a
lanl:Researcher position. The second example will present a grammar that
simulates an unlabeled network path calculation by ignoring vertex types and
edge labels.

Note that the two examples presented are for locating all paths between
a source and a sink vertex. This is for demonstration purposes only. If one
required only the shortest path, the algorithm can halt once a path between
the source and sink has been found. In unweighted networks, using a
breadth-first search algorithm, the first path discovered is always the
shortest path [29].

5.1. A Non-Recurrent Paths Grammar

Figure 10 presents a geodesic grammar that determines the set of all
non-recurrent paths between lanl:johan and lanl:norman according to
lanl:hasFriend relationships where every friend along the walker's path must
be a lanl:Researcher.

Note the diagrammatic conventions used to represent a grammar. Every
rwr:Context, rwr:Rule, and rwr:Attribute has a numeric suffix (e.g. _0) after
its type. This is to denote that each representation of the same rwr:Context,
rwr:Rule, or rwr:Attribute is, in fact, a distinct vertex in $\Psi$. The label
of the rwr:Context is the object of the rwr:forResource property minus the
suffix. Furthermore, the dashed contexts are rwr:EntryContexts and the dotted
contexts are rwr:ExitContexts. Thus, lanl:johan_0 is the source context and
lanl:norman_4 is the sink context in $\Psi$, where lanl:johan is the source
vertex and lanl:norman is the sink vertex in $G^n$.

[Figure 10 diagram: surviving labels are the contexts lanl:johan_0,
lanl:Researcher_2, lanl:Human_3, and lanl:norman_4; the rules rwr:Traverse_1,
rwr:Traverse_2, rwr:Traverse_3, rwr:PathCount_3 ("2"), and rwr:PathCount_4
("0"); the attributes rwr:NotEver_1 and rwr:Is_3 ("1"); and the edge labels
lanl:hasFriend and lanl:hasPosition.]

Figure 10: A grammar to determine all non-recurrent lanl:hasFriend paths from
lanl:johan to lanl:norman.



The rwr:Rules of an rwr:Context are represented in their order of execution
from bottom to top. The rwr:Attributes are associated, in no particular
order, with their respective rwr:Context. If a rule or attribute requires a
literal rwr:step specification, that literal is appended to its respective
rule or attribute. The + or - symbol on the head of an edge denotes whether
the rwr:Traverse edge is an rwr:OutEdge or rwr:InEdge, respectively.

At n = 0, $g^{p_0}_0$ = lanl:johan and $P = \{p_0\}$. The first rule to be
executed is the rwr:PathCount_0 rule in which $p_0$ will register
$g^{p_0}_0$ in $q^{p_0}$ such that $q^{p_0}_0 = g^{p_0}_0$. After adding
lanl:johan to $q^{p_0}$, the walker will execute the rwr:Traverse_0 rule. The
rwr:Traverse_0 rule yields $\Gamma$(lanl:johan, $p_0$) =
{(lanl:johan, lanl:hasFriend, lanl:marko)}. If lanl:norman was a friend of
lanl:johan, then that edge would have been represented in
$\Gamma$(lanl:johan, $p_0$) as well. Because lanl:marko $\notin g^{p_0}$, the
rwr:NotEver_1 attribute of the lanl:Human_1 context has an
$X(p_0)_1 = \emptyset$.

At n = 1, the current path of $p_0$ is $g^{p_0}$ = (lanl:johan,
lanl:hasFriend, +, lanl:marko) and the current return path is $q^{p_0}$ =
(lanl:johan). There exists only one rule at lanl:Human_1. The rwr:Traverse_1
rule dictates that $p_0$ take an outgoing edge from lanl:marko to a
lanl:Researcher position. Given that there is only one edge that can be
traversed, $\Gamma$(lanl:marko, $p_0$) =
{(lanl:marko, lanl:hasPosition, lanl:Researcher)}.

At n = 2, the current path of $p_0$ is $g^{p_0}$ = (lanl:johan,
lanl:hasFriend, +, lanl:marko, lanl:hasPosition, +, lanl:Researcher) and the
current return path is $q^{p_0}$ = (lanl:johan). The only rule of the
lanl:Researcher_2 context is to return to the human that was last encountered,
as specified by the rwr:Is_3 attribute of the next lanl:Human_3 context. Thus,
$\Gamma$(lanl:Researcher, $p_0$) =
{(lanl:marko, lanl:hasPosition, lanl:Researcher)}.

At n = 3, the current path of $p_0$ is $g^{p_0}$ = (lanl:johan,
lanl:hasFriend, +, lanl:marko, lanl:hasPosition, +, lanl:Researcher,
lanl:hasPosition, -, lanl:marko). Given the rwr:PathCount_3 rule with an
rwr:step of 2, $q^{p_0}$ = (lanl:johan, lanl:hasFriend, +, lanl:marko). The
rwr:Traverse_3 rule provides a $\Gamma$(lanl:marko, $p_0$) with two edges such
that $\Gamma$(lanl:marko, $p_0$) = {(lanl:marko, lanl:hasFriend, lanl:jhw),
(lanl:marko, lanl:hasFriend, lanl:norman)}. Note that the edge (lanl:marko,
lanl:hasFriend, lanl:johan) does not exist in $\Gamma$(lanl:marko, $p_0$)
because of the rwr:NotEver_1 attribute at the lanl:Human_1 context (i.e.
$X(p_0)_4$ = {lanl:johan, lanl:marko}). Because two edges exist in
$\Gamma$(lanl:marko, $p_0$), $p_0$ is cloned such that $P = \{p_0, p_1\}$,
$g^{p_0} = g^{p_1}$, and $q^{p_0} = q^{p_1}$. The walker $p_0$ will take one
edge and $p_1$ will take the other edge.

At n = 4, $p_1$ will be at lanl:norman in $G^n$ and thus at an
rwr:ExitContext in $\Psi$. However, before $p_1$ halts, rwr:PathCount_4 is
executed such that $Q = \{q^{p_1}\}$ = {(lanl:johan, lanl:hasFriend, +,
lanl:marko, lanl:hasFriend, +, lanl:norman)}. At the completion of
rwr:PathCount_4 there are no other rules to execute and thus $p_1$ halts. The
walker $p_0$, on the other hand, will be at lanl:jhw at n = 4. It is not
until n = 7 that $p_0$ arrives at lanl:norman.

At n = 7, $q^{p_0}$ = (lanl:johan, lanl:hasFriend, +, lanl:marko,
lanl:hasFriend, +, lanl:jhw, lanl:hasFriend, +, lanl:norman). At n = 7, the
grammar is complete and $|Q| = 2$.

The shortest path of Q is defined as the function
$s : V \times V \times \Psi \rightarrow \mathbb{N}$, where

$$
s(i,j,\Psi) = \min \left( \bigcup_{q \in Q} \frac{|q| - 1}{3} \right).
$$

The 1 must be subtracted from $|q|$ in order to not include source vertex i as
a step, and the result must then be divided by 3 so as to avoid the inclusion
of the edge label and directionality of the edge in the path length
calculation. In the example presented, the shortest "researcher-constrained
friendship" path is 2. From s, it is possible to generate all other geodesic
functions as defined in Section 2.
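As a worked check of the path length arithmetic, the following Python sketch applies (|q| - 1) / 3 to the two return paths found in the example. The function name `shortest_path_length` and the list encoding of each path are assumptions made for illustration.

```python
def shortest_path_length(Q):
    """Minimum of (|q| - 1) / 3 over all return paths q in Q."""
    return min((len(q) - 1) // 3 for q in Q)


# The two return paths from the example: vertex, then (label, direction, vertex) triples.
q_p1 = ["lanl:johan",
        "lanl:hasFriend", "+", "lanl:marko",
        "lanl:hasFriend", "+", "lanl:norman"]
q_p0 = ["lanl:johan",
        "lanl:hasFriend", "+", "lanl:marko",
        "lanl:hasFriend", "+", "lanl:jhw",
        "lanl:hasFriend", "+", "lanl:norman"]

print(shortest_path_length([q_p0, q_p1]))  # 2
```

The path through lanl:jhw has 10 elements and length 3, while the direct path through lanl:marko has 7 elements and length 2.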

In the presented example, the source vertex is lanl:johan and the sink
vertex is lanl:norman. It is noted that the rwr:EntryContext and
rwr:ExitContext of $\Psi$ can be reconfigured to support new i and j source
and sink vertices. In other words, $\Psi$ can be configured to support
different i/j path calculations.

5.2. A Grammar to Simulate Unlabeled Geodesics

This section presents another example of the grammar-based geodesic
algorithm. In this example, the grammar presented is equivalent to removing
the edge labels and directionality from the semantic network and calculating
a traditional geodesic metric on it. Figure 11 presents the grammar where, in
RDFS, rdfs:Resource is the base type of all resources (vertices and edge
labels). Thus, all rwr:Contexts and rwr:Edges can legally resolve to any
vertex and edge label, respectively.

The grammar in Figure 11 will determine the set of all non-recurrent paths
between lanl:johan and lanl:norman such that any edge type can be traversed
to any vertex type. The central rwr:Context is the rdfs:Resource_1 context.



[Figure 11 diagram: surviving labels are the contexts lanl:johan_0,
rdfs:Resource_1, and lanl:norman_2; the rules rwr:PathCount_0 ("0"),
rwr:Traverse_0, rwr:PathCount_1 ("0"), rwr:Traverse_1, and rwr:PathCount_2
("0"); the attribute rwr:NotEver_1; and rdfs:Resource edge labels.]

Figure 11: An unconstrained grammar to determine all non-recurrent paths from
lanl:johan to lanl:norman.



A walker will loop over rdfs:Resource_1 until it can find an edge to make the
final traversal to lanl:norman. Note the use of both rwr:OutEdges (+) and
rwr:InEdges (-). With both edges accessible, the walker can walk in any
direction on the network. Thus, this grammar is equivalent to executing a
geodesic on an undirected and unlabeled version of the semantic network.
Finally, the grammar will produce no recurrent paths because of the
rwr:NotEver_1 attribute.

Given this and the original social network instance $G^n$ diagrammed in
Figure 9, the shortest path between lanl:johan and lanl:norman is
(lanl:johan, lanl:contacted, -, lanl:norman) with a path length of 1. To
contrast, in the first example when the walker's path was constrained to
researcher friendship relationships, the shortest path between lanl:johan and
lanl:norman was 2.
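The equivalence with an undirected, unlabeled geodesic can be sketched with a plain breadth-first search in Python. The triples below are a hypothetical fragment that agrees with the walkthrough in the text, not the full Figure 9 instance, and `unlabeled_shortest_path` is an illustrative name.

```python
from collections import deque


def unlabeled_shortest_path(triples, source, sink):
    """Breadth-first search over triples, ignoring edge labels and direction."""
    neighbors = {}
    for s, _, o in triples:
        neighbors.setdefault(s, set()).add(o)
        neighbors.setdefault(o, set()).add(s)
    distance = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == sink:
            return distance[v]  # first arrival is shortest in an unweighted network
        for u in neighbors.get(v, ()):
            if u not in distance:
                distance[u] = distance[v] + 1
                queue.append(u)
    return None  # sink not reachable


# Assumed fragment of the instance, consistent with the paths named in the text.
triples = [
    ("lanl:johan", "lanl:hasFriend", "lanl:marko"),
    ("lanl:marko", "lanl:hasFriend", "lanl:jhw"),
    ("lanl:marko", "lanl:hasFriend", "lanl:norman"),
    ("lanl:jhw", "lanl:hasFriend", "lanl:norman"),
    ("lanl:marko", "lanl:hasPosition", "lanl:Researcher"),
    ("lanl:norman", "lanl:contacted", "lanl:johan"),
]

print(unlabeled_shortest_path(triples, "lanl:johan", "lanl:norman"))  # 1
```

Dropping the lanl:contacted triple from this fragment lengthens the result to 2, which matches the friendship path through lanl:marko.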



6. Analysis 

The semantic network is an unweighted network. Thus, determining the
shortest path between any two vertices is best solved by a breadth-first
algorithm. The grammar-based walker, through cloning, is analogous to a
breadth-first search through the network. However, not all edges are
considered by the walker and thus, the running time of the algorithm is less
than or equal to $O(|V| + |E|)$. The determination of the running time of the
algorithm is grammar dependent. In order to calculate the running time of a
particular grammar, it is important to calculate the number of vertices and
edges of the grammar-specified types in $G^n$. In the worst case situation,
the walker population P will have traversed all vertices and edges from the
source to ultimately locate the sink. However, because the network is
unweighted, once the sink has been found by a single $p \in P$, the shortest
path has been determined so the algorithm is complete.
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7. Computational Reuse with p-Encodings 



Once a computation has been performed, its results can be reused as a
sub-solution to a larger problem. As stated previously, the path calculations
between two vertices in a network are the kernel calculations for more
complex path metrics such as shortest path, eccentricity, radius, diameter,
closeness centrality, and betweenness centrality. This section will
demonstrate how to encode the $q^p$ data structure into a semantic network
such that the results of these calculations can be reused for each of the
higher-order metrics.

For instance, suppose the function $f : \mathbb{N} \rightarrow \mathbb{N}$,
where $f(n) = n + 1$. Furthermore, suppose that there exist the resources
"1"^^xsd:int and "2"^^xsd:int such that there also exists the triple
(see footnote 3)

("1"^^xsd:int, f, "2"^^xsd:int).

The triple states, in human language, that the number 1 is related to the
number 2 by the functional relationship f. If that triple is in $G^n$, then
never again would it be necessary to compute f(1) because the result has
already been computed and has been represented in $G^n$. Thus, $G^n$ can be
queried for the result of the f(1) computation. For example,

X = {?x1 | ("1"^^xsd:int, f, ?x1) $\in G^n$}

would return the result of f(1). However, this is a trivial example because it
is faster to compute f(1) on the local hardware processor than it is to query
$G^n$ for the solution. In other situations, this is not necessarily the case.
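The reuse idea can be sketched with an in-memory set of triples in Python. This is a toy illustration only: a real triple-store would be queried rather than scanned, and the names `G`, `f`, and `cached_f` are assumptions.

```python
G = set()  # stands in for the semantic network: a set of (subject, predicate, object)


def f(n):
    return n + 1


def cached_f(n):
    """Return f(n), reading the result from G when a triple already encodes it."""
    for s, p, o in G:
        if s == n and p == "f":
            return o
    result = f(n)
    G.add((n, "f", result))  # record the result so it never needs recomputing
    return result


print(cached_f(1))  # computed, then stored as the triple (1, "f", 2)
print(cached_f(1))  # answered from G
```

The same pattern applies to path calculations, where recomputation is far more expensive than a lookup.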

For more complex computations, such as the set of paths between two
vertices in V according to some $\Psi$, it is possible to represent p and its
associated data structure $q^p$ as a semantic network. Figure 12 is a diagram
of the RDFS ontology representing p and $q^p$, where the noted components are
considered either named graphs [30], separate semantic network instances, or
reified sub-networks [24]. From instances of this ontology, it is possible to
reuse the path calculations to determine various geodesics without
recalculating the $\Psi$-correct paths between any two vertices i and j.

For example, given the $q^{p_1}$ path calculated in Section 5.1, the
semantic network representation would be represented as diagrammed in Figure
13. The number of rwr:Segments is the largest
rdfs:ContainerMembershipProperty (i.e. rdf:_3) for the rwr:Path. The path
length of $q^{p_1}$ is thus rdf:_3 - 1 (i.e. 3 - 1). To make the mapping to
the convention used in Section 5.1 more salient, note the rwr:Segment
component labels at the bottom of the diagram.

If the grammar-based path algorithm halts when it reaches an
rwr:ExitContext, then every $q^p$ instance is a shortest path. While only the
shortest path between two vertices is required for geodesic metrics, the next
subsections present the generalized algorithm for searching all $q^p$ paths
between source vertex i and sink vertex j.

Footnote 3: The namespace prefix xsd is used to specify the data type of the
quoted symbols. In this case, xsd:int refers to an integer data type.

Figure 12: Encoding p and its associated $q^p$ data structure in a semantic
network.

Figure 13: An instance of the RDFS ontology in Figure 12.

7.1. p-Encoded Shortest Path

To compute the shortest path between two vertices i and j, where the
complete set P is searched, the grammar-based shortest path algorithm is
represented as

$$
\begin{aligned}
X_{i,j} = \{?x2, ?x5 \mid\; & (?x1, \texttt{rdf:type}, \texttt{rwr:GeodesicWalker}) \in p \\
\wedge\; & (?x1, \texttt{rwr:usesGrammar}, \Psi) \in p \\
\wedge\; & (?x1, \texttt{rwr:hasQPath}, ?x2) \in p \\
\wedge\; & (?x2, \texttt{rdf:\_1}, ?x3) \in q^p \\
\wedge\; & (?x3, \texttt{rwr:hasVertex}, i) \in q^p \\
\wedge\; & (?x4, \texttt{rdf:type}, \texttt{rwr:EntryContext}) \in \Psi \\
\wedge\; & (?x4, \texttt{rwr:forResource}, i) \in \Psi \\
\wedge\; & (?x2, ?x5, ?x6) \in q^p \\
\wedge\; & (?x6, \texttt{rwr:hasVertex}, j) \in q^p \\
\wedge\; & (?x7, \texttt{rdf:type}, \texttt{rwr:ExitContext}) \in \Psi \\
\wedge\; & (?x7, \texttt{rwr:forResource}, j) \in \Psi\}
\end{aligned}
$$

and the function min : rwr:Path $\times$ rdfs:ContainerMembershipProperty
$\rightarrow \mathbb{N}$ returns the smallest value of the second component
of its domain minus the rdf:_ head. For example, if $X_{i,j}$ =
{(rwr:Path_0, rdf:_4), (rwr:Path_1, rdf:_3)}, then $\min(X_{i,j}) = 3$. The
first rwr:Path element is used later when calculating the betweenness
centrality of a vertex.

The $X_{i,j}$ query simply returns the path identifier and the number of
segments of each path between the rwr:EntryContext and the rwr:ExitContext.
More specifically, the query that generates $X_{i,j}$ can be understood, in
human language, as saying: "Given the set of all rwr:GeodesicWalkers (?x1)
that use $\Psi$ as their grammar and who have a q-path (?x2) that has i as the
vertex of the first (i.e. rdf:_1) rwr:Segment (?x3), where i is the
rwr:EntryContext vertex of $\Psi$ (?x4) and who have j in a q-path rwr:Segment
(?x6), where j is the rwr:ExitContext vertex of $\Psi$ (?x7), return the
rwr:Path (?x2) and the rwr:Segment count (?x5) of the j rwr:Segment."
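A small Python sketch of the min function applied to query results, assuming each result is a (path identifier, membership property) pair of strings. The name `min_segments` is hypothetical, and the final subtraction of 1 follows the path length convention used for Figure 13 (rdf:_3 - 1).

```python
def min_segments(X):
    """Smallest rdf:_k index among (path, membership property) pairs."""
    prefix = "rdf:_"
    return min(int(prop[len(prefix):]) for _, prop in X)


X_ij = {("rwr:Path_0", "rdf:_4"), ("rwr:Path_1", "rdf:_3")}

segments = min_segments(X_ij)
shortest = segments - 1  # path length excludes the source segment
print(segments, shortest)
```

For the two-path example in the text this gives 3 segments and a shortest path length of 2.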

7.2. p-Encoded Eccentricity, Radius, and Diameter

Given the shortest path query, it is possible to generate other grammar-based
geodesics. For instance, for eccentricity,

$$
e(i,\Psi) = \max_{j \in V} s(i,j,\Psi),
$$

where

$$
s(i,j,\Psi) = \min(X_{i,j}) - 1.
$$

For radius,

$$
r(G^n,\Psi) = \min_{i \in V} e(i,\Psi).
$$

Finally, for diameter,

$$
d(G^n,\Psi) = \max_{i \in V} e(i,\Psi).
$$
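These three metrics follow directly from a table of shortest path lengths. The Python sketch below uses the standard definitions (eccentricity as the largest shortest path length from a vertex, radius and diameter as the smallest and largest eccentricity) on a hypothetical three-vertex network; the dictionary `s` and the vertex names are illustrative assumptions.

```python
def eccentricity(s, i, V):
    """Largest shortest-path length from i to any other vertex."""
    return max(s[(i, j)] for j in V if j != i)


def radius(s, V):
    """Smallest eccentricity over all vertices."""
    return min(eccentricity(s, i, V) for i in V)


def diameter(s, V):
    """Largest eccentricity over all vertices."""
    return max(eccentricity(s, i, V) for i in V)


# Hypothetical shortest-path lengths for a chain a - b - c.
V = ["a", "b", "c"]
s = {("a", "b"): 1, ("b", "a"): 1,
     ("b", "c"): 1, ("c", "b"): 1,
     ("a", "c"): 2, ("c", "a"): 2}

print(eccentricity(s, "a", V), radius(s, V), diameter(s, V))
```

In practice each entry of `s` would come from a stored path result rather than being recomputed.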



7.3. p-Encoded Closeness and Betweenness Centrality

For closeness centrality,

$$
c(i,\Psi) = \frac{1}{\sum_{j \in V} s(i,j,\Psi)}.
$$

Finally, for betweenness centrality, if ms : rwr:Path $\times$
rdfs:ContainerMembershipProperty $\rightarrow$ rwr:Path, where ms returns the
set of shortest paths in its domain and

$$
\begin{aligned}
Y_{j,k,i} = \{\ldots\; & \\
\wedge\; & (\ldots, \texttt{rwr:hasVertex}, i) \in q^p \\
\wedge\; & (?x8, \texttt{rwr:hasVertex}, k) \in q^p \\
\wedge\; & (?x9, \texttt{rdf:type}, \texttt{rwr:ExitContext}) \in \Psi \\
\wedge\; & (?x9, \texttt{rwr:forResource}, k) \in \Psi \\
\wedge\; & ?x7 > ?x5 \\
\wedge\; & ?x2 \in ms(X_{j,k})\}
\end{aligned}
$$

represents the set of shortest paths from j to k such that there exists some
rwr:Segment in the rwr:Path that has i as its vertex, then

$$
b(i,\Psi) = \sum_{j \neq i \neq k \in V} \frac{|Y_{j,k,i}|}{|ms(X_{j,k})|}.
$$

To calculate the betweenness centrality of vertex i, it is important to know
the number of shortest paths that go from j to k as well as the number of
shortest paths that go from j to k through i. The function ms is used to
determine which of those elements in $X_{j,k}$ are shortest paths. The set
$Y_{j,k,i}$ is then the set of all paths between j and k that go through i
and are elements of $ms(X_{j,k})$.
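Both centralities can be sketched in Python from precomputed shortest paths. The sketch uses the standard definitions (closeness as the reciprocal of summed shortest path lengths, betweenness as the summed fraction of shortest j-to-k paths passing through i) on a hypothetical three-vertex chain; `closeness`, `betweenness`, and the input dictionaries are illustrative names.

```python
def closeness(s, i, V):
    """Reciprocal of the summed shortest-path lengths from i."""
    return 1 / sum(s[(i, j)] for j in V if j != i)


def betweenness(shortest_paths, i):
    """Sum, over pairs (j, k) not involving i, of the fraction of shortest
    j-to-k paths that pass through i.

    shortest_paths -- dict mapping (j, k) to a list of vertex sequences
    """
    total = 0.0
    for (j, k), paths in shortest_paths.items():
        if i in (j, k) or not paths:
            continue
        through_i = [p for p in paths if i in p[1:-1]]
        total += len(through_i) / len(paths)
    return total


# Hypothetical chain a - b - c, with ordered source and sink pairs.
V = ["a", "b", "c"]
s = {("a", "b"): 1, ("b", "a"): 1,
     ("b", "c"): 1, ("c", "b"): 1,
     ("a", "c"): 2, ("c", "a"): 2}
shortest_paths = {
    ("a", "b"): [["a", "b"]], ("b", "a"): [["b", "a"]],
    ("b", "c"): [["b", "c"]], ("c", "b"): [["c", "b"]],
    ("a", "c"): [["a", "b", "c"]], ("c", "a"): [["c", "b", "a"]],
}

print(closeness(s, "b", V), betweenness(shortest_paths, "b"))
```

Because pairs are ordered here, the middle vertex of the chain scores 2.0; counting unordered pairs instead would halve that value.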
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8. Conclusion 



This article has presented a technique to port some of the most fundamental
geodesic network analysis algorithms into the semantic network domain. There
currently exist many technologies to support large-scale semantic network
models represented according to RDF. High-end, modern-day triple-stores
support on the order of 10^9 triples [31] [32]. While many centrality
algorithms are costly on large networks, by restricting the search to
meaningful subsets of the full semantic network, as defined by a grammar,
geodesic metrics can be reasonably executed on even the most immense and
complex of data sets [33] [34].

Acknowledgments

Marko A. Rodriguez is funded by the MESUR project (http://www.mesur.org)
which is supported by a grant from the Andrew W. Mellon Foundation. Marko
is also funded by a Director's Fellowship granted by the Los Alamos National
Laboratory.

References 

[1] F. Harary, P. Hage, Eccentricity and centrality in networks, Social Networks
17 (1995) 57-63.

[2] L. C. Freeman, A set of measures of centrality based on betweenness,
Sociometry 40 (1977) 35-41.

[3] A. Bavelas, Communication patterns in task oriented groups, The Journal
of the Acoustical Society of America 22 (1950) 271-282.

[4] M. A. Rodriguez, Mapping semantic networks to undirected networks,
International Journal of Applied Mathematics and Computer Science 5 (1)
(2008) 39-42.

[5] U. Brandes, T. Erlebach (Eds.), Network Analysis: Methodological
Foundations, Springer, Berlin, DE, 2005.

[6] M. A. Rodriguez, Social decision making with multi-relational networks
and grammar-based particle swarms, in: Proceedings of the Hawaii
International Conference on Systems Science, IEEE Computer Society, Waikoloa,
Hawaii, 2007, pp. 39-49. doi:10.1109/HICSS.2007.487

[7] K. Anyanwu, A. Sheth, ρ-queries: Enabling querying for semantic
associations on the semantic web, in: Proceedings of the Twelfth
International World-Wide Web Conference, ACM, New York, NY, 2003,
pp. 690-699. doi:10.1145/775152.775249

[8] H. Zhuge, L. Zheng, Ranking semantic-linked network, in: Proceedings of
the International World Wide Web Conference, Budapest, Hungary, 2003.






[9] S. Lin, Interesting instance discovery in multi-relational data, in: D. L.
McGuinness, G. Ferguson (Eds.), Proceedings of the Conference on Innovative
Applications of Artificial Intelligence, MIT Press, 2004, pp. 991-992.

[10] B. Aleman-Meza, C. Halaschek-Wiener, I. B. Arpinar, C. Ramakrishnan,
A. P. Sheth, Ranking complex relationships on the semantic web, IEEE
Internet Computing 9 (3) (2005) 37-44. doi:10.1109/MIC.2005.63

[11] A. P. Sheth, I. B. Arpinar, C. Halaschek, C. Ramakrishnan, C. Bertram,
Y. Warke, D. Avant, F. S. Arpinar, K. Anyanwu, K. Kochut, Semantic
association identification and knowledge discovery for national security
applications, Journal of Database Management 16 (1) (2005) 33-53.

[12] M. A. Rodriguez, Grammar-based random walkers in semantic networks,
Knowledge-Based Systems 21 (7) (2008) 727-739.
doi:10.1016/j.knosys.2008.03.030

[13] P. Bonacich, Power and centrality: A family of measures, American
Journal of Sociology 92 (5) (1987) 1170-1182.

[14] S. Brin, L. Page, The anatomy of a large-scale hypertextual web search
engine, Computer Networks and ISDN Systems 30 (1-7) (1998) 107-117.

[15] M. A. Rodriguez, J. Shinavier, Exposing multi-relational networks to
single-relational network analysis algorithms, Journal of Informetrics 4 (1)
(2009) 29-41. doi:10.1016/j.joi.2009.06.004

[16] Y. Blanco-Fernández, J. J. Pazos-Arias, A. Gil-Solla, M. Ramos-Cabrer,
M. López-Nores, J. García-Duque, A. Fernández-Vilas, R. P. Díaz-Redondo,
J. Bermejo-Muñoz, A flexible semantic inference methodology to reason about
user preferences in knowledge-based recommender systems, Knowledge-Based
Systems 21 (4) (2008) 305-320. doi:10.1016/j.knosys.2007.07.004

[17] K. P. Chitrapura, S. R. Kashyap, Node ranking in labeled directed graphs,
in: Proceedings of the Conference on Information and Knowledge Management
(CIKM'04), ACM, New York, NY, 2004, pp. 597-606.
doi:10.1145/1031171.1031281

[18] E. W. Dijkstra, A note on two problems in connexion with graphs,
Numerische Mathematik 1 (1959) 269-271.

[19] S. Wasserman, K. Faust, Social Network Analysis: Methods and
Applications, Cambridge University Press, Cambridge, UK, 1994.

[20] H. J. Leavitt, Some effects of communication patterns on group
performance, Journal of Abnormal and Social Psychology 46 (1951) 38-50.

[21] G. Sabidussi, The centrality index of a graph, Psychometrika 31 (1966)
581-603.






[22] U. Brandes, A faster algorithm for betweenness centrality, Journal of
Mathematical Sociology 25 (2) (2001) 163-177.

[23] M. E. J. Newman, A measure of betweenness centrality based on random
walks, Social Networks 27 (1) (2005) 39-54.

[24] F. Manola, E. Miller, RDF primer: W3C recommendation (February 2004)
[cited November 2006].
URL http://www.w3.org/TR/rdf-primer/

[25] D. Brickley, R. V. Guha, RDF vocabulary description language 1.0: RDF
schema, Tech. rep., World Wide Web Consortium (2004).
URL http://www.w3.org/TR/rdf-schema/

[26] J. F. Sowa, Encyclopedia of Artificial Intelligence, Wiley, 1987,
Ch. Semantic Networks.

[27] T. Berners-Lee, R. T. Fielding, L. Masinter, Uniform Resource Identifier
(URI): Generic Syntax (January 2005).
URL http://www.ietf.org/rfc/rfc2396.txt

[28] P. V. Biron, A. Malhotra, XML schema part 2: Datatypes second edition,
Tech. rep., World Wide Web Consortium (2004).
URL http://www.w3.org/TR/xmlschema-2/

[29] T. H. Cormen, C. E. Leiserson, R. L. Rivest, Introduction to Algorithms,
MIT Press, 1999.

[30] J. J. Carroll, C. Bizer, P. Hayes, P. Stickler, Named graphs, provenance
and trust, in: Proceedings of the International World Wide Web Conference,
ACM Press, Chiba, Japan, 2005, pp. 613-622.

[31] R. Lee, Scalability report on triple store applications, Tech. rep.,
Massachusetts Institute of Technology (2004).

[32] C. Weiss, P. Karras, A. Bernstein, Hexastore: sextuple indexing for
semantic web data management, Proceedings of the Very Large Database
Endowment 1 (1) (2008) 1008-1019. doi:10.1145/1453856.1453965

[33] J. Bollen, M. A. Rodriguez, H. Van de Sompel, L. L. Balakireva,
A. Hagberg, The largest scholarly semantic network... ever, in: Proceedings
of the World Wide Web Conference, ACM Press, New York, NY, 2007,
pp. 1247-1248. doi:10.1145/1242572.1242789

[34] J. Bollen, H. Van de Sompel, M. A. Rodriguez, Towards usage-based impact
metrics: first results from the MESUR project, in: Proceedings of the
Joint Conference on Digital Libraries, IEEE/ACM, New York, NY, 2008,
pp. 231-240. doi:10.1145/1378889.1378928






